SET INTERSECTION PROBLEMS: SUPPORTING 
HYPERPLANES AND QUADRATIC PROGRAMMING 



C.H. JEFFREY PANG 



Abstract. We study how the supporting hyperplanes produced by the pro- 
jection process can complement the method of alternating projections and its 
variants for the convex set intersection problem. For the problem of finding the 
closest point in the intersection of closed convex sets, we propose an algorithm 
that, like Dykstra's algorithm, converges strongly in a Hilbert space. Moreover, 
this algorithm converges in finitely many iterations when the closed convex sets 
are cones in $\mathbb{R}^n$ satisfying an alignment condition. Next, we propose
modifications of the alternating projection algorithm, and prove their convergence.
The algorithm converges superlinearly in $\mathbb{R}^n$ under some nice conditions. Under a
conical condition, the convergence can be finite. Lastly, we discuss the case
where the intersection of the sets is empty.

1. Introduction

For finitely many closed convex sets $K_1, \dots, K_r$ in a Hilbert space $X$, the Set
Intersection Problem (SIP) is stated as:

    Find $x \in K := \bigcap_{i=1}^{r} K_i$, where $K \neq \emptyset$.    (1.1)



Date: January 1, 2013. 

2000 Mathematics Subject Classification. 90C30, 90C59, 47J25, 47A46, 47A50, 52A20, 41A50, 
41A65, 46C05, 49J53, 65K10. 

Key words and phrases. Dykstra's algorithm, best approximation problem, alternating projec- 
tions, quadratic programming, supporting hyperplanes, superlinear convergence. 




One assumption on the sets $K_i$ is that projecting a point in $X$ onto each $K_i$ is a
relatively easy problem.

A popular method of solving the SIP is the Method of Alternating Projections
(MAP), where one iteratively projects a point through the sets $K_i$ to find a point in
$K$. Another problem related to the SIP is the Best Approximation Problem (BAP):
Find the closest point to $x_0$ in $K$, that is,

    $\min \|x - x_0\|$    (1.2)
    s.t. $x \in K := \bigcap_{i=1}^{r} K_i$,

for closed convex sets $K_i$, $i = 1, \dots, r$. One can easily construct an example in $\mathbb{R}^2$
involving a circle and a line such that the MAP converges to a point in $K$ that is
not $P_K(x_0)$. Fortunately, Dykstra's algorithm [Dyk83, BD86] reduces the problem
of finding the projection onto $K$ to the problem of projecting onto the sets $K_i$ individually
by adding correction vectors after each iteration. It was rediscovered in [Han88]
using mathematical programming duality. For more on the background and recent
developments of the MAP and its variants, we refer the reader to [BB96, BR09,
ER11], as well as [Deu01b, Chapter 9] and [BZ05, Subsubsection 4.5.4].
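The gap between the MAP limit and the best approximation can be seen in a small numerical sketch. The sets below (the closed unit disc as $K_1$, the line $\{(x, y) \mid y = 0\}$ as $K_2$), the starting point $(2, 1)$ and all function names are illustrative assumptions, not taken from the paper.

```python
import math

def proj_disc(p):
    """Project p onto the closed unit disc centred at the origin (assumed K_1)."""
    n = math.hypot(p[0], p[1])
    return p if n <= 1.0 else (p[0] / n, p[1] / n)

def proj_line(p):
    """Project p onto the line {(x, y) : y = 0} (assumed K_2)."""
    return (p[0], 0.0)

def alternating_projections(x0, iters=50):
    """Method of alternating projections: apply P_{K_1}, then P_{K_2}, repeatedly."""
    x = x0
    for _ in range(iters):
        x = proj_line(proj_disc(x))
    return x

x0 = (2.0, 1.0)
map_limit = alternating_projections(x0)

# Here K is the segment [-1, 1] x {0}, so the closest point of K to x0 is (1, 0).
best = (1.0, 0.0)

print("MAP limit:", map_limit, "distance to x0:", math.dist(x0, map_limit))
print("P_K(x0):  ", best, "distance to x0:", math.dist(x0, best))
```

In this instance the MAP stops at a point of $K$ after one sweep, but that point is strictly farther from $x_0$ than $P_K(x_0)$.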

We quote [Deu01a], where it is mentioned that the MAP has found application
in at least ten different areas of mathematics, which include: (1) solving linear 
equations; (2) the Dirichlet problem which has in turn inspired the "domain de- 
composition" industry; (3) probability and statistics; (4) computing Bergman ker- 
nels; (5) approximating multivariate functions by sums of univariate ones; (6) least 
change secant updates; (7) multigrid methods; (8) conformal mapping; (9) image 
restoration; (10) computed tomography. See also [Deu95] for more information.

One problem with the MAP and Dykstra's algorithm is slow convergence. A few
acceleration methods have been explored. The papers [GPR67, GK89, BDHP03]
explored the acceleration of the MAP using a line search in the case where the $K_i$ are
linear subspaces of $X$. See [Deu01a] for a survey. One can easily rewrite the SIP
as a Convex Inequality Problem (CIP):

    Find $x \in \mathbb{R}^n$ satisfying $g(x) \le 0$,

where $g : \mathbb{R}^n \to \mathbb{R}^r$ is such that each $g_i : \mathbb{R}^n \to \mathbb{R}$, where $i = 1, \dots, r$, is convex:
Just set $g_i(x)$ to be the distance from $x$ to $K_i$. In the case where $X = \mathbb{R}^n$ and each
$g_i(\cdot)$ is differentiable with Lipschitz gradient, the papers [GP98, GP01] proved the
superlinear convergence of an algorithm for the CIP. They make use of the subgradients of
$g(\cdot)$ to define separating hyperplanes to the feasible set, and make use of quadratic
programming to achieve superlinear convergence. Another related work is [Kiw95],
where the interest is in problems where $r$, the number of closed convex sets $K_i$, is
large.

We elaborate on the quadratic programming approach. Given $x_1 \in X$ and
the projection $x_2 = P_{K_1}(x_1)$, provided $x_2 \neq x_1$, a standard result on supporting
hyperplanes gives us $K_1 \subset \{x \mid \langle x_1 - x_2, x \rangle \le \langle x_1 - x_2, x_2 \rangle\}$. The aim of this work
is to make use of the supporting hyperplanes generated in the projection process to
accelerate the convergence to a point in $K$. A relaxation of (1.2) is

    $\min \|x - x_0\|^2$    (1.3)
    s.t. $\langle a_i, x \rangle \le b_i$ for $i = 1, \dots, k$,

where each constraint $\langle a_i, x \rangle \le b_i$ corresponds to a supporting hyperplane obtained
by the projection operation onto one of the sets $K_l$, where $1 \le l \le r$. Let
$S = \mathrm{span}\{a_i \mid i = 1, \dots, k\}$, and let $V = \{v_1, \dots, v_{k'}\}$ be a set of orthonormal vectors
spanning $S$, where $k' \le k$. We can write $a_i = \sum_{j=1}^{k'} \alpha_{i,j} v_j$, $x_0 = y_0 + z_0$ and
$x = \sum_{j=1}^{k'} y_j v_j + z$, where $y_0 \in S$ and $z_0, z \in S^\perp$. Then (1.3) can be rewritten as

    $\min_{y \in \mathbb{R}^{k'},\ z \in S^\perp} \Big\| \sum_{j=1}^{k'} y_j v_j - y_0 \Big\|^2 + \|z - z_0\|^2$    (1.4)
    s.t. $\sum_{j=1}^{k'} \alpha_{i,j} y_j \le b_i$ for $i = 1, \dots, k$.

Therefore (1.3) can be easily solved using convex quadratic programming, especially
when $k$ and $k'$ are small. (See for example [NW06, Chapter 16].)
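The constraints of the quadratic program come from supporting halfspaces produced by projections. The following Python sketch builds one such halfspace $\{x \mid \langle a, x \rangle \le b\}$ with $a = x_1 - x_2$ and $b = \langle a, x_2 \rangle$, where $x_2$ is the projection of $x_1$. The choice of the unit disc as the set, the point $(3, 4)$, and the helper names are assumptions made only for illustration.

```python
import math
import random

def proj_disc(p):
    """Project p onto the closed unit disc (an assumed stand-in for K_1)."""
    n = math.hypot(p[0], p[1])
    return p if n <= 1.0 else (p[0] / n, p[1] / n)

def supporting_halfspace(x1, project):
    """Return (a, b) with a = x1 - x2 and b = <a, x2>, where x2 = project(x1)."""
    x2 = project(x1)
    a = (x1[0] - x2[0], x1[1] - x2[1])
    b = a[0] * x2[0] + a[1] * x2[1]
    return a, b

x1 = (3.0, 4.0)
a, b = supporting_halfspace(x1, proj_disc)

# Sample points of the disc and check that they all satisfy <a, x> <= b.
random.seed(0)
samples = [proj_disc((random.uniform(-2, 2), random.uniform(-2, 2)))
           for _ in range(1000)]
all_inside = all(a[0] * s[0] + a[1] * s[1] <= b + 1e-9 for s in samples)

# The point x1 itself is cut off by the halfspace.
x1_cut_off = a[0] * x1[0] + a[1] * x1[1] > b

print(a, b, all_inside, x1_cut_off)
```

Each projection therefore yields one linear constraint of the form used in (1.3) at no extra cost beyond the projection itself.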

The quadratic programming formulation (1.3) gathers information from the
supporting hyperplanes to many of the closed convex sets $K_i$, and so is a good
approximation to (1.2); the intersection of the halfspaces defined by the supporting
hyperplanes can produce a set that is a better approximation of $K$ than each $K_i$ taken
singly. Hence there is good reason to believe that (1.3) can achieve better
convergence than simple variants of the MAP. As Figure 1.1 illustrates, the supporting
hyperplanes can provide a good outer estimate of the intersection $K$. Furthermore,
as more constraints are added in the quadratic programming formulation (1.3), it is
possible to use warm starts from previous iterations to accelerate convergence. In
this paper, we shall only pursue the idea of supplementing the MAP with
supporting hyperplanes and quadratic programming, and not the details of the quadratic
programming subproblem.




Figure 1.1. The method of alternating projections on two convex
sets $K_1$ and $K_2$ in $\mathbb{R}^2$ with starting iterate $x_0$ arrives at $x_3$ in three
iterations. But the point $x_4$ generated by the cutting planes of
$K_1$ and $K_2$ at $x_1$ and $x_2$ respectively is much closer to the point
$\bar{x}$, especially when the boundaries of $K_1$ and $K_2$ have fewer second
order effects and when the angle between the boundary of $K_1$ and
K2 is small. On the other hand, the point x% is ruled out by the 
supporting hyperplane of K\ passing through x^. 



We remark that the idea of using supporting hyperplanes to approximate the set
$K$ was also considered in [BCRZM03], but their motivation was to make use of
hyperplanes to simplify the projections onto the sets $K_i$ rather than to accelerate
convergence.

1.1. Contributions of this paper. In this paper, we prove theoretical properties
of the alternating projection method supplemented with the insight on supporting
hyperplanes. Sections 3 to 6 are mostly independent of each other.

First, we propose Algorithm 3.1 for the Best Approximation Problem (1.2) in
Section 3. We prove norm convergence, and the finite convergence of Algorithm
3.1 with (3.1b) when the sets $K_i \subset \mathbb{R}^n$ have a local conic structure and satisfy a normal
condition. We also show that the normal condition cannot be dropped.

In Section 4, we propose modifications of the alternating projection algorithm
for the Set Intersection Problem (1.1), and prove their convergence. We also prove
the superlinear convergence of a modified alternating projection algorithm in $\mathbb{R}^2$.

In Section 5, we prove the most striking result of this paper, which is the
superlinear convergence of an algorithm for the Set Intersection Problem (1.1) in
$\mathbb{R}^n$ under reasonable conditions. The convergence can be finite if there is a local
conic structure at the limit point. The proofs of superlinear convergence are quite
different from the proof in Section 4.

Lastly, in Section 6, we discuss the behavior of Algorithm 3.1 in the case when
the intersection of the closed convex sets is empty.

1.2. Notation. We shall let $\mathbb{B}_r(x)$ be the closed ball with radius $r$ and center $x$.
The projection operation onto a set $C$ is denoted by $P_C(\cdot)$. We also make use of
standard constructs in convex analysis.

2. Some useful results 

In this section, we recall or prove some results that will be used in two
or more of the later sections. The reader may wish to skip this section and come
back to refer to the results as needed.

The result below shows that supporting hyperplanes near a point in a convex set
behave well.

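Theorem 2.1. (Supporting hyperplane near a point) Suppose $C \subset \mathbb{R}^n$ is convex,
and let $\bar{x} \in C$. Then for any $\epsilon > 0$, there is a $\delta > 0$ such that for any point
$x \in [\mathbb{B}_\delta(\bar{x}) \cap C] \setminus \{\bar{x}\}$ and supporting hyperplane $A$ of $C$ with unit normal $v \in N_C(x)$
at the point $x$, we have $\frac{d(\bar{x}, A)}{\|x - \bar{x}\|} \le \epsilon$.

Since $d(\bar{x}, A) = -\langle v, \bar{x} - x \rangle$, the conclusion of this result can be replaced by
$-\frac{\langle v, \bar{x} - x \rangle}{\|\bar{x} - x\|} \le \epsilon$ instead.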

Proof. We refer to Figure 2.1. For a given $\epsilon_1 > 0$, there is some $\delta > 0$ such that
if $x \in \mathbb{B}_\delta(\bar{x}) \cap C$ and $v \in N_C(x)$ is a unit vector, then there is some unit vector
$\bar{v} \in N_C(\bar{x})$ such that $\|v - \bar{v}\| \le \epsilon_1$. This means that the angle between $v$ and $\bar{v}$ is
at most $2\sin^{-1}(\epsilon_1/2)$.

One can easily check that $x - \bar{x}$ is not a multiple of $v$. Consider the two
dimensional affine space that contains the vector $v$ and the points $x$ and $\bar{x}$, and project
the point $\bar{x} + \bar{v}$ onto this affine space. Let this projection be $\bar{x} + v'$. It is easy
to check that the angle between $v'$ and $v$, marked as $\alpha$ in Figure 2.1, is bounded
from above by $2\sin^{-1}(\epsilon_1/2)$. (The lines with arrows at both ends passing through
$x$ and $\bar{x}$ respectively represent the intersection of supporting hyperplanes with the
two dimensional affine space.)






The angle $\theta$ in Figure 2.1 is an upper bound on the angle between $x - \bar{x}$ and the
supporting hyperplane $A$, and is easily checked to satisfy $\theta \le \alpha$. We thus have

    $\frac{d(\bar{x}, A)}{\|x - \bar{x}\|} \le \sin\theta \le \sin\alpha \le \sin\big(2\sin^{-1}(\epsilon_1/2)\big)$.

So for a given $\epsilon > 0$, if $\epsilon_1$ were chosen to be such that $\sin\big(2\sin^{-1}(\epsilon_1/2)\big) \le \epsilon$, then
we are done. □




Figure 2.1. Diagram for the proof of Theorem 2.1.

Next, we recall Moreau's Theorem, and remark on how it will be used. For a
convex cone $C$, we denote its negative polar cone by $C^-$.

Theorem 2.2. (Moreau's Decomposition Theorem) Suppose $C \subset \mathbb{R}^n$ is a closed
convex cone. Then for any $x \in \mathbb{R}^n$, we can write $x = P_C(x) + P_{C^-}(x)$, and
moreover, $\langle P_C(x), P_{C^-}(x) \rangle = 0$.

The following result will be used in Theorems 3.5 and 5.11.

Proposition 2.3. (Projection onto cones) Suppose $C \subset \mathbb{R}^n$ is a closed convex
cone. Then the supporting hyperplane formed by projecting a point $y$ onto $C$ would
contain the origin.

Proof. By Moreau's Theorem, the projection $P_C(y)$ satisfies

    $y = P_C(y) + P_{C^-}(y)$ and $\langle P_C(y), P_{C^-}(y) \rangle = 0$.

The supporting halfspace produced by projecting $y$ onto $C$ would be

    $\{x \mid \langle x, y - P_C(y) \rangle \le \langle P_C(y), y - P_C(y) \rangle\}$,

which equals $\{x \mid \langle x, P_{C^-}(y) \rangle \le 0\}$, since $y - P_C(y) = P_{C^-}(y)$ and the right hand
side is zero. It is clear that the origin is on the supporting hyperplane
$\{x \mid \langle x, P_{C^-}(y) \rangle = 0\}$. □
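As a concrete check of Theorem 2.2 and Proposition 2.3, the following Python sketch uses the nonnegative orthant as the cone $C$, for which $C^-$ is the nonpositive orthant and both projections act coordinatewise. The choice of cone, the test vector and the function names are assumptions made for illustration.

```python
def proj_orthant(y):
    """P_C for C the nonnegative orthant: clip negative entries to zero."""
    return [max(t, 0.0) for t in y]

def proj_polar(y):
    """P_{C^-} for the polar cone C^- (the nonpositive orthant)."""
    return [min(t, 0.0) for t in y]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

y = [3.0, -2.0, 0.5, -1.5]
p, q = proj_orthant(y), proj_polar(y)

# Moreau decomposition: y = P_C(y) + P_{C^-}(y) with orthogonal parts.
decomposes = all(pi + qi == yi for pi, qi, yi in zip(p, q, y))
orthogonal = dot(p, q) == 0.0

# Supporting halfspace from projecting y onto C: {x : <x, a> <= b}.
a = [yi - pi for yi, pi in zip(y, p)]   # equals P_{C^-}(y)
b = dot(p, a)                           # equals 0, so the origin is on the boundary

print(p, q, decomposes, orthogonal, a, b)
```

Since $b = 0$, the bounding hyperplane passes through the origin, as Proposition 2.3 states.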

3. Convergence for the Best Approximation Problem 

In this section, we discuss algorithms for the Best Approximation Problem (1.2).
We describe Algorithm 3.1 and show strong convergence to the closest point in
the intersection of the closed convex sets (Theorem 3.3). Furthermore, in the finite
dimensional case where the sets have a local conic structure, Algorithm 3.1 with
(3.1b) converges in finitely many iterations (Theorem 3.5) under a normal condition
(3.6). We give an example to show that the condition (3.6) cannot be dropped.

For each $n \in \mathbb{N}$, let $[n]$ denote "$n$ mod $r$"; that is,

    $[n] := \{1, 2, \dots, r\} \cap \{n - kr \mid k = 0, 1, 2, \dots\}$.

We present our algorithm for this section.

Algorithm 3.1. (Best approximation) For a point $x_0$ and closed convex sets $K_l$,
$l = 1, 2, \dots, r$, of a Hilbert space $X$, find the closest point to $x_0$ in $K := \bigcap_{l=1}^{r} K_l$.

Step 0: Let $i = 1$.

Step 1: Choose $J_i \subset \{1, \dots, r\}$. Some examples of $J_i$ are

    $J_i = \{[i]\}$,    (3.1a)
    and $J_i = \{1, \dots, r\}$.    (3.1b)

For $j \in J_i$, define $x_i^{(j)} \in X$, $a_i^{(j)} \in X$ and $b_i^{(j)} \in \mathbb{R}$ by

    $x_i^{(j)} = P_{K_j}(x_{i-1})$,
    $a_i^{(j)} = x_{i-1} - x_i^{(j)}$,
    and $b_i^{(j)} = \langle a_i^{(j)}, x_i^{(j)} \rangle$.

Define the set $F_i \subset X$ by

    $F_i := \{x \mid \langle a_l^{(j)}, x \rangle \le b_l^{(j)}$ for all $l = 1, \dots, i$ and $j \in J_l\}$.    (3.2)

Let $x_i = P_{F_i}(x_0)$.

Step 2: Set $i \leftarrow i + 1$, and go back to step 1.

When $J_i$ is chosen using (3.1a) and $x_{i-1} \in K_{[i]}$, so that $x_i^{([i])} = P_{K_{[i]}}(x_{i-1}) = x_{i-1}$,
then $a_i^{([i])} = 0$ and $b_i^{([i])} = 0$, and the algorithm stalls for one step. These values
of $a_i^{([i])}$ and $b_i^{([i])}$ are still valid, though any implementation should treat this case
separately. When the algorithm stalls for $r$ iterations in a row, then we have found
the closest point from $x_0$ to $\bigcap_{l=1}^{r} K_l$.
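The steps of Algorithm 3.1 with the choice (3.1b) can be sketched in Python for a small planar instance. Everything specific here is an assumption for illustration: the sets (closed unit disc and the half-plane $\{y \le 0\}$), the starting point, the function names, and in particular the way the subproblem $P_{F_i}(x_0)$ is solved, which is a brute-force enumeration valid only in $\mathbb{R}^2$ and not a general quadratic programming method.

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def proj_disc(p):
    """Projection onto the closed unit disc (assumed K_1)."""
    n = math.hypot(p[0], p[1])
    return p if n <= 1.0 else (p[0] / n, p[1] / n)

def proj_halfplane(p):
    """Projection onto {(x, y) : y <= 0} (assumed K_2)."""
    return (p[0], min(p[1], 0.0))

def proj_polyhedron_2d(x0, cuts, tol=1e-9):
    """Project x0 onto {x : <a, x> <= b for (a, b) in cuts}, with unit normals a.
    In the plane the projection is x0 itself, the projection of x0 onto one
    boundary line, or a vertex where two boundary lines meet; try them all."""
    candidates = [x0]
    for a, b in cuts:
        t = dot(a, x0) - b
        candidates.append((x0[0] - t * a[0], x0[1] - t * a[1]))
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            (a, b), (c, d) = cuts[i], cuts[j]
            det = a[0] * c[1] - a[1] * c[0]
            if abs(det) > tol:  # skip (nearly) parallel lines
                candidates.append(((b * c[1] - a[1] * d) / det,
                                   (a[0] * d - b * c[0]) / det))
    feasible = [p for p in candidates
                if all(dot(a, p) <= b + tol for a, b in cuts)]
    return min(feasible, key=lambda p: math.dist(p, x0))

def best_approximation(x0, projections, iters=10):
    """Sketch of Algorithm 3.1 with J_i = {1, ..., r}: collect every supporting
    halfspace seen so far and project x0 onto their intersection."""
    cuts, x = [], x0
    for _ in range(iters):
        for project in projections:
            y = project(x)
            a = (x[0] - y[0], x[1] - y[1])
            n = math.hypot(a[0], a[1])
            if n > 1e-12:  # x already in this set: no new halfspace
                a = (a[0] / n, a[1] / n)
                cuts.append((a, dot(a, y)))
        x = proj_polyhedron_2d(x0, cuts)
    return x

x0 = (2.0, 1.0)
x_best = best_approximation(x0, [proj_disc, proj_halfplane])
print(x_best)  # the closest point of K to x0 is (1, 0) for these sets
```

For these sets the closest point of $K$ to $x_0$ is the corner $(1, 0)$, where $x_0 - (1, 0) = (1, 1)$ lies in the normal cone of $K$. The case where a projection returns the point itself is skipped explicitly, in line with the remark above that implementations should treat a stalled step separately.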

Remark 3.2. (Projecting to sets with greater second order behavior) In Step 1 of
Algorithm 3.1, one needs to choose $J_i$. When the quadratic programs
are small and easy to solve, it would be ideal to choose $J_i$ so that $|J_i| = 1$. The
cyclic choice in (3.1a) is a natural choice. But as remarked in Figure 1.1, one factor
in our strategy is the second order behavior of the sets $K_l$. Another strategy is to
record the distances in the most recent projections to the sets $K_l$, and choose $J_i$ to
contain the index where the highest distance was recorded. In the case where one
of the sets $K_l$ is a subspace (and has fewer second order effects), the computations
would be focused on the other sets. However, one may want to ensure that all sets
are projected to every once in a while so that Algorithm 3.1 is not fooled in regions
where the boundary is locally but not globally affine. Possible strategies are:

    There exists $p$ such that for all $j$, $\bigcup_{i=j}^{j+p} J_i = \{1, \dots, r\}$,

    or for each $l = 1, \dots, r$, there are infinitely many $J_i$ containing $l$.

The following theorem addresses the convergence of Algorithm 3.1. This theorem
can be compared to the Boyle-Dykstra Theorem [BD86], which establishes the
convergence of Dykstra's algorithm [Dyk83].

Theorem 3.3. (Strong convergence of Algorithm 3.1) For any starting point $x_0$, the
sequences $\{x_i\}$ produced by Algorithm 3.1 using (3.1a) or (3.1b) converge strongly
to $P_K(x_0)$.

Proof. We shall only prove the result for the choice (3.1a), since the proof for (3.1b)
is similar. By considering a translation if necessary, we can let $x_0$ be $0$. We can
also assume that $0 \notin K$. The iterates $x_i$ satisfy $\|x_i\| \le d(0, K)$, so $\{x_i\}$ has a weak
cluster point $z$. Since $x_i$ is the closest point from $0$ to $F_i$, and

    $F_{i+1} \subset F_i$ for all $i$,    (3.3)

we see that $\{\|x_i\|\}$ is an increasing sequence, so $M := \lim_{i \to \infty} \|x_i\|$ exists.

Step 1: $z$ is actually a strong cluster point. It is clear that $\lim_{i \to \infty} \|x_i\| \ge \|z\|$.
We only need to prove that

    $\|z\| = \lim_{i \to \infty} \|x_i\|$,    (3.4)

since this condition together with the weak convergence of the subsequence of $\{x_i\}$
implies the strong convergence to $z$. Suppose instead that $\lim_{i \to \infty} \|x_i\| > \|z\|$.
Then there is some $k$ such that $\|x_k\| > \|z\|$. By (3.3), every $x_i$ with $i > k$ lies in $F_k$,
and $x_k$ is the projection of $0$ onto $F_k$, so we have, for all $i > k$,

    $\langle x_k, x_i \rangle \ge \langle x_k, x_k \rangle = \|x_k\|^2 > \|x_k\| \|z\| \ge \langle x_k, z \rangle$,

contradicting $z$ being a weak cluster point of $\{x_i\}$. Therefore $z$ is a strong cluster
point of $\{x_i\}$.

Step 2: Any such $z$ is in $K$. Suppose on the contrary that $z \notin K$. Then there is
some $l^*$ such that $z \notin K_{l^*}$, or $P_{K_{l^*}}(z) \neq z$. Algorithm 3.1 generates a hyperplane
that separates $z$ from $K_{l^*}$. The halfspace $\{x \mid \langle a_z, x \rangle \le b_z\}$ separates $z$ and $K$,
where for $y \in X$, $a_y$ and $b_y$ are defined by

    $a_y = y - P_{K_{l^*}}(y)$, $b_y = \langle y - P_{K_{l^*}}(y), P_{K_{l^*}}(y) \rangle$.

The distance $D$ from $0$ to the intersection of halfspaces

    $\{x \mid \langle -z, x \rangle \le -\|z\|^2$ and $\langle a_z, x \rangle \le b_z\}$

would satisfy $D > \|z\|$.

Next, the variables $a_y$ and $b_y$ depend continuously on the parameter $y$ at $y = z$.
This means that if $x_i$ is sufficiently close to $z$ and $[i] = l^*$, then the distance
$d(0, x_{i+1})$ would be sufficiently close to $D$. This would mean that $\|x_i\| > \|z\|$ for $i$
large enough, which is a contradiction to (3.4). Thus $z \in K$ as needed.

Step 3: $z = P_K(x_0)$. To see this, observe that $z \in K$ implies that $d(0, K) \le \|z\|$.
The fact that $\|z\| = \lim_{i \to \infty} \|x_i\|$ from step 1, together with $\|x_i\| \le d(0, K)$, gives
$d(0, K) = \|z\|$.

Thus we are done. □

Remark 3.4. (Reducing number of supporting hyperplanes in defining $F_i$) In the
proof of Theorem 3.3, step 1 relies on the fact that $F_{i+1} \subset F_i$ for all $i$ in the choice
of $F_i$ in (3.2). If $X = \mathbb{R}^n$, then step 1 of the proof would be unnecessary, but the
sequence $\{\|x_i - x_0\|\}$ needs to be increasing in order for step 2 to work. This can
be enforced by adding the hyperplane with normal $(x_0 - x_{i-1})$ through $x_{i-1}$ in
constructing $F_i$. To ensure that each quadratic programming problem that needs
to be solved is easy, the polyhedron $F_i$ can be chosen such that the number of
inequalities that define $F_i$ is small. One can take only the active hyperplanes in
solving the projection problem $x_i = P_{F_i}(x_0)$, or aggregate some of the active
hyperplanes into one active hyperplane when building up the polyhedron $F_i$.

In the case where the $K_l$ are cones, each supporting hyperplane contains the point
$0$. This means that there are no second order effects near $0$, which in turn gives
fast convergence in $\mathbb{R}^n$.

Theorem 3.5. (Finite convergence for conical problems in $\mathbb{R}^n$) For Algorithm 3.1
with (3.1b), suppose that $X = \mathbb{R}^n$. Convergence is guaranteed by Theorem 3.3.
Suppose $P_K(x_0) = \bar{x}$, and the $K_j$ are such that

    for some $\epsilon > 0$, $[K_j - \bar{x}] \cap \epsilon\mathbb{B} = T_{K_j}(\bar{x}) \cap \epsilon\mathbb{B}$ for $j = 1, \dots, r$,    (3.5)

and

    [...]    (3.6)

Then Algorithm 3.1 with (3.1b) converges to $\bar{x}$ in finitely many iterations.

Proof. We can assume $\bar{x} = 0$. Suppose on the contrary that the convergence to $0$
requires infinitely many iterations. We seek a contradiction. Let $\{x_i\}$ be the
sequence generated by Algorithm 3.1. Passing to a subsequence if necessary, we can
assume that $\|x_i - 0\| < \epsilon$ for all $i$, and that $\lim_{i \to \infty} \frac{x_i}{\|x_i\|}$ exists, say $\tilde{x}$.

Step 1: $\tilde{x}$ lies in $T_K(0)$. Suppose on the contrary that $\tilde{x} \notin T_K(0)$. Then $\tilde{x} \notin T_{K_j}(0)$
for some $j$. Assume without loss of generality that $j = 1$. Let $P_{T_{K_1}(0)}(\tilde{x}) = z$, so that
$\tilde{x} - z \in N_{K_1}(0)$. Let $v_i = \frac{x_i}{\|x_i\|} - P_{T_{K_1}(0)}\big(\frac{x_i}{\|x_i\|}\big)$ and $v = \tilde{x} - z$. By the continuity
of the projection, we must have $v_i \to v$. Since the hyperplane $\{x \mid \langle x, v \rangle = \langle z, v \rangle\}$
separates $\tilde{x}$ from $T_{K_1}(0)$ and $\langle z, v \rangle = 0$ by Moreau's Theorem, we have $\langle \tilde{x}, v \rangle > 0$.

Let $y$ be any point in $\epsilon\mathbb{B}$, taking into account (3.5) and $\bar{x} = 0$. By Moreau's
Theorem (see Proposition 2.3), the supporting hyperplane produced by projecting
$y$ onto $K_1$ contains $0$ on its boundary. By the design of Algorithm 3.1, we must
have $\langle x_{i+1}, v_i \rangle \le 0$, which gives

    $\Big\langle \frac{x_{i+1}}{\|x_{i+1}\|}, v_i \Big\rangle \le 0$.

Taking limits, we get $\langle \tilde{x}, v \rangle \le 0$, which contradicts $\langle \tilde{x}, v \rangle > 0$ earlier.

Step 2: $\tilde{x}$ cannot lie in $T_K(0)$. Suppose otherwise. Then the condition (3.6)
implies that if $i$ is large enough, then $d(x_0, 0) < d(x_0, x_i)$, contradicting
that $d(x_0, 0) \ge d(x_0, x_i)$ in the choice of $x_i$.

The statements proved in Steps 1 and 2 are clearly contradictory, which ends
our proof. □

In view of the above result, we would expect Algorithm 3.1 (especially with
(3.1b)) to converge quickly to the closest point under condition (3.6).

The number of iterations needed before convergence depends on, among other
things, $\epsilon$. In the case where the $K_j$ are cones and (3.6) does not hold, step 2 in the
proof of Theorem 3.5 may fail, and there may be no finite convergence. We give an
example.

Example 3.6. (No finite convergence) Consider $X = \mathbb{R}^3$. Consider the rays

    $r_1 = \mathbb{R}_+(1, -1, -1)$ and $r_2 = \mathbb{R}_+(-1, -1, -1)$.

For a vector $v$, let $\theta_1$ be the angle $r_1$ makes with $v$, and let $\theta_2$ be similarly defined.
Let $K_1$ and $K_2$ be the ice cream cones defined by

    $K_i = \{v \mid \cos(\theta_i) \ge 1/\sqrt{3}\}$ for $i = 1, 2$.

Let $x_0 = (0, 0, 1)$. A few consequences are immediate.

(1) The ray $\mathbb{R}_+(0, -1, 0)$ is on the boundaries of $K_1$, $K_2$ and $K := K_1 \cap K_2$.

(2) $P_K(x_0) = 0$.

(3) There is only one unit vector in $N_{K_1}((0, -1, 0))$, say $u$. Let the subspace
$S$ be $\{x \mid (1, 0, 0)^T x = 0\}$. Then

    $[\mathbb{R}_+\{u\} + \{(0, -1, 0)\}] \cap S = \{(0, -1, 0)\}$.

A similar statement holds when $K_1$ is replaced by $K_2$.

We now show that Algorithm 3.1 with (3.1b) does not converge in finitely many
iterations. By symmetry, the iterates $x_i$ lie in $S$. If Algorithm 3.1 with (3.1b) converges in finitely
many iterations, then property (3) would imply that the next to last iterate is of
the form $(0, -\alpha, 0)$, where $\alpha > 0$, and that cannot happen. In the case where
$x_0 = (0, \epsilon, 1)$, where $\epsilon > 0$ is arbitrarily small, we will still get finite convergence to
$0$, but the number of iterations needed will be arbitrarily large as $\epsilon \searrow 0$.

4. Convergence for the Set Intersection Problem 

In this section, we analyze a modified alternating projection algorithm
(Algorithm 4.1). The global convergence of this algorithm is proved in Theorem 4.5. The
insight on supporting hyperplanes allows us to obtain local superlinear convergence
in $\mathbb{R}^2$, although Algorithm 4.1 in its current form does not converge superlinearly in
$\mathbb{R}^3$ (Example 4.7). A locally superlinearly convergent algorithm will be presented
and analyzed in Section 5 using very different methods.

We shall analyze the following algorithm.

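Algorithm 4.1. (Modified MAP) For a point $x_0$ and closed convex sets $K_1$ and
$K_2$ of a Hilbert space $X$, find a point in $K := K_1 \cap K_2$.

Step 0: Set $i = 1$.

Step 1: Choose $J_i \subset \{1, 2\}$. Some examples are

    $J_i = \{[i]\}$,    (4.1a)
    and $J_i = \{1, 2\}$.    (4.1b)

Step 2: For $j \in J_i$, define $x_i^{(j)} \in X$, $a_i^{(j)} \in X$ and $b_i^{(j)} \in \mathbb{R}$ by

    $x_i^{(j)} = P_{K_j}(x_{i-1})$,
    $a_i^{(j)} = x_{i-1} - x_i^{(j)}$,
    and $b_i^{(j)} = \langle a_i^{(j)}, x_i^{(j)} \rangle$.

Define the set $F_i \subset X$ by

    $F_i := \{x \mid \langle a_l^{([l])}, x \rangle \le b_l^{([l])}$ for $l = i-1, i\}$    if $J_i = \{[i]\}$ and $i > 1$,
    $F_i := \{x \mid \langle a_i^{([i])}, x \rangle \le b_i^{([i])}\}$    if $J_i = \{[i]\}$ and $i = 1$,
    $F_i := \{x \mid \langle a_i^{(j)}, x \rangle \le b_i^{(j)}$ for $j = 1, 2\}$    if $J_i = \{1, 2\}$.

Let $x_i = P_{F_i}(x_i^{([i])})$.

Step 3: Set $i \leftarrow i + 1$, and go back to step 1.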



As mentioned in Remark 3.2, there are good reasons for choosing $J_i$ to be such
that $|J_i| = 1$ but not cyclic, but the construction of $F_i$ has to be amended
accordingly. It may turn out that $x_i$ could be in $K_1$ already, so $P_{K_1}(x_i)$ will not give a
new supporting hyperplane. In this case, we can just use the supporting hyperplane
obtained from previous iterations. When $J_i = \{1, 2\}$, we can check that $x_i$ lies in
the plane containing $x_{i-1}$, $x_i^{(1)}$ and $x_i^{(2)}$, and that $x_i = P_{F_i}(x_i^{(j)}) = P_{F_i}(x_{i-1})$. We
shall prove the superlinear convergence of this case in $\mathbb{R}^2$ in Theorem 4.6.

We now recall some results on Fejér monotonicity to prove convergence of
Algorithm 4.1. We take our results from [BZ05, Theorem 4.5.10 and Lemma 4.5.8].

Definition 4.2. (Fejér monotone sequence) Let $X$ be a Hilbert space, let $C \subset X$
be a closed convex set and let $\{x_i\}$ be a sequence in $X$. We say that $\{x_i\}$ is Fejér
monotone with respect to $C$ if

    $\|x_{i+1} - c\| \le \|x_i - c\|$ for all $c \in C$ and $i = 1, 2, \dots$

Theorem 4.3. (Properties of Fejér monotonicity) Let $X$ be a Hilbert space, let
$C \subset X$ be a closed convex set and let $\{x_i\}$ be a Fejér monotone sequence with
respect to $C$. Then

(1) $\{x_i\}$ is bounded and $d(C, x_{i+1}) \le d(C, x_i)$.

(2) $\{x_i\}$ has at most one weak cluster point in $C$.

(3) If $\mathrm{int}(C) \neq \emptyset$, then $\{x_i\}$ converges in norm.

Lemma 4.4. (Attractive property of projection) Let $X$ be a Hilbert space and let
$C \subset X$ be a closed convex set. Then $P_C : X \to X$ is 1-attracting with respect to
$C$: For every $x \notin C$ and $y \in C$, we have

    $\|P_C(x) - x\|^2 \le \|x - y\|^2 - \|P_C(x) - y\|^2$.

We now prove the convergence of Algorithm 4.1.

Theorem 4.5. (Convergence of Algorithm 4.1) Suppose $K_1$ and $K_2$ are closed
convex sets in a Hilbert space $X$ such that $K := K_1 \cap K_2 \neq \emptyset$. Then the iterates
in Algorithm 4.1 with either (4.1a) or (4.1b) are such that $x_i$ converges weakly to
some $z$. The convergence is strong if either $\mathrm{int}(K) \neq \emptyset$ or $X = \mathbb{R}^n$.

Proof. We shall first prove convergence when $J_i$ is chosen by (4.1a). We note that
Algorithm 4.1 can be easily extended to the case of $r > 2$ closed convex sets, and
the corresponding extension of this result will still be true.

The sequences $\{x_{2i-1}^{(1)}\}_i$ and $\{x_{2i}^{(2)}\}_i$ lie in $K_1$ and $K_2$ respectively. Construct the
sequence $\{\tilde{x}_i\}$ such that

    $\tilde{x}_i = x_j$    if $i = 2j$,
    $\tilde{x}_i = x_{2j+1}^{(1)}$    if $i = 4j + 1$,
    $\tilde{x}_i = x_{2j+2}^{(2)}$    if $i = 4j + 3$.

Note that $\{\tilde{x}_i\}$ lines up the points in $\{x_i\}$ and $\{x_i^{([i])}\}$ in the order in which they
were produced in Algorithm 4.1.

Step 1: $\{\tilde{x}_i\}$ is Fejér monotone with respect to $K$. Since $K \subset K_1$, $K \subset K_2$
and $K \subset F_i$ for all $i$, and the projections $P_{K_1}$, $P_{K_2}$ and $P_{F_i}$ are nonexpansive, we have

    $\|x_i^{([i])} - y\| = \|P_{K_{[i]}}(x_{i-1}) - y\| \le \|x_{i-1} - y\|$
    and $\|x_i - y\| = \|P_{F_i}(x_i^{([i])}) - y\| \le \|x_i^{([i])} - y\|$ for all $y \in K$ and $i \ge 1$.

This means that $\{\tilde{x}_i\}$ is a Fejér monotone sequence with respect to $K$.

Step 2: $\{\tilde{x}_i\}$ is asymptotically regular, i.e.,

    $\lim_{i \to \infty} \|\tilde{x}_i - \tilde{x}_{i+1}\| = 0$.

Fix any $y \in K$. Applying Lemma 4.4, we get

    $\|x_i^{([i])} - x_{i-1}\|^2 = \|P_{K_{[i]}}(x_{i-1}) - x_{i-1}\|^2 \le \|x_{i-1} - y\|^2 - \|x_i^{([i])} - y\|^2$
    and $\|x_i - x_i^{([i])}\|^2 \le \|x_i^{([i])} - y\|^2 - \|x_i - y\|^2$ for all $i \ge 1$.

This tells us that $\|\tilde{x}_i - \tilde{x}_{i-1}\|^2 \le \|\tilde{x}_{i-1} - y\|^2 - \|\tilde{x}_i - y\|^2$ for all $i \ge 1$. Since $\{\tilde{x}_i\}$
is Fejér monotone with respect to $K$, $\{\|\tilde{x}_i - y\|^2\}$ is a decreasing sequence, and hence
convergent. We thus have the asymptotic regularity of $\{\tilde{x}_i\}$.

Step 3: Wrapping up. By Theorem 4.3(1), the sequence $\{\tilde{x}_i\}$ is bounded.
So $\{\tilde{x}_i\}$ has a weakly convergent subsequence, say $\{\tilde{x}_{i_k}\}_k$. By the asymptotic regularity of
$\{\tilde{x}_i\}$, the sequence $\{\tilde{x}_{i_k+1}\}_k$ has the same limit as $\{\tilde{x}_{i_k}\}_k$, so we can take a different
subsequence if necessary and assume that infinitely many of the $i_k$ are odd. We can
choose yet another subsequence of $\{\tilde{x}_{i_k}\}$ if necessary so that all terms are in either
$K_1$, or all terms are in $K_2$. For the sake of argument, assume that all terms lie in
$K_1$. So the weak limit of $\{\tilde{x}_{i_k}\}_k$, say $\bar{x}$, lies in $K_1$. By the asymptotic regularity
of $\{\tilde{x}_i\}$ and considering $\{\tilde{x}_{i_k+2}\}_k$, we see that $\bar{x} \in K_2$. So the weak cluster point
must lie in $K$. By Theorem 4.3(2), we conclude that $\{\tilde{x}_i\}$ converges weakly to a point in
$K$. The last sentence of the result follows from Theorem 4.3(3).

For the case of using (4.1b), the steps are very similar, so we only give an outline:
One proves that the sequences $\{x_i\}$ and $\{x_i^{(j)}\}$ are Fejér monotone with respect to
$K$ for $j = 1, 2$. Next, the sequence $x_0, x_1^{(j)}, x_1, x_2^{(j)}, x_2, \dots$ is asymptotically regular,
which implies that the sequences $\{x_i\}$ and $\{x_i^{(j)}\}$ have the same weak cluster points.
Since $j$ is arbitrary, the weak cluster points must lie in $K$, and by Theorem 4.3(2),
such a weak cluster point is unique. □

The problem of whether the MAP can converge strongly in a Hilbert space has
only been recently resolved to be negative in [Hun04], so it remains to be seen how
Theorem 4.5 can be strengthened.

We now move on to the fast local convergence of Algorithm 4.1. Even though the
result below is only valid for $\mathbb{R}^2$ and a result establishing superlinear convergence
for $\mathbb{R}^n$ is presented in Section 5, Theorem 4.6 has value because the proof is simpler
than and very different from the proof in Section 5, and the assumptions needed
are quite different.

Theorem 4.6. (Superlinear convergence in $\mathbb{R}^2$) Suppose $K_1$ and $K_2$ are closed
convex sets in $\mathbb{R}^2$ such that

(1) Algorithm 4.1 with (4.1b) converges to a point $\bar{x}$ such that
$\partial N_{K_1}(\bar{x}) \cap \partial[-N_{K_2}(\bar{x})] = \{0\}$, and

(2) There is some iterate $x_i$ such that $x_i \notin \mathrm{int}(K_j)$ for $j = 1, 2$.

Then the sequence $\{x_i\}$ thus produced converges locally superlinearly to $\bar{x}$.

Proof. We refer to Figure 4.1. Let $\alpha_i^{(1)}$ be the angle between $x_i - x_i^{(1)}$ and $\bar{x} - x_i^{(1)}$,
and let $\alpha_i^{(2)}$ be similarly defined. As $i \to \infty$, the points $x_i^{(1)}$ and $x_i^{(2)}$ converge to
$\bar{x}$, so Theorem 2.1 says that the angles $\alpha_i^{(1)}$ and $\alpha_i^{(2)}$ converge to zero.

Let $\theta_i$ be the angle between $x_i^{(1)} - x_i$ and $x_i^{(2)} - x_i$ as marked. Since
$\partial N_{K_1}(\bar{x}) \cap \partial[-N_{K_2}(\bar{x})] = \{0\}$, the angle $\theta_i$ is bounded from below by some $\bar{\theta} > 0$. It is also easy
to check that if $x_i \notin \mathrm{int}(K_j)$ for $j = 1, 2$, then the same property holds for all $i$
afterward.

Figure 4.1. Diagram for the proof of Theorem 4.6.

The points $p_1$ and $p_2$ are obtained by projecting $x_i$ onto the line segments $[x_i^{(1)}, \bar{x}]$
and $[x_i^{(2)}, \bar{x}]$. To show that $\{x_i\}$ converges superlinearly to $\bar{x}$, it suffices to show
that

    $\lim_{i \to \infty} \frac{\|x_i - \bar{x}\|}{\|x_{i-1} - x_i\|} = 0$,    (4.2)

since $\|x_{i-1} - \bar{x}\| \ge \|x_{i-1} - x_i\|$. Let $L_i = \|x_{i-1} - x_i\|$. By the sine rule, the
distance $\|x_i^{(1)} - x_i\|$ equals $L_i \sin\gamma_i^{(1)}$, where $\gamma_i^{(1)}$ is some
angle in the interval $[0, \pi - \theta_i]$. The distance $\|p_1 - x_i\|$ can be calculated to be
bounded above by $L_i \sin\alpha_i^{(1)} \sin\gamma_i^{(1)}$, while the distance $\|p_2 - x_i\|$ is easily computed
to be bounded from above by $L_i \sin\alpha_i^{(2)} \sin\gamma_i^{(2)}$, where $\gamma_i^{(2)}$ is similarly defined. The
distance $\|x_i - \bar{x}\|$ is easily seen to be the diameter of the circumcircle of the cyclic
quadrilateral with vertices $x_i$, $\bar{x}$, $p_1$ and $p_2$. The angle between $p_1 - x_i$ and $p_2 - x_i$
is easily calculated to be $\pi - \theta_i + \alpha_i^{(1)} + \alpha_i^{(2)}$. (Note that $x_i^{(1)}$, $x_i$ and $p_1$ need not
be collinear.) The distance $\|p_1 - p_2\|$ can be estimated by

    $\|p_1 - p_2\| \le \|p_1 - x_i\| + \|p_2 - x_i\| \le L_i \sin(\min\{\pi/2, \pi - \theta_i\}) \big[\sin\alpha_i^{(1)} + \sin\alpha_i^{(2)}\big]$.

The value $\|x_i - \bar{x}\|$ can be obtained by the sine rule to be

    $\frac{\|p_1 - p_2\|}{\sin(\pi - \theta_i + \alpha_i^{(1)} + \alpha_i^{(2)})}$,

so we have

    $\frac{\|x_i - \bar{x}\|}{L_i} \le \frac{\sin(\min\{\pi/2, \pi - \theta_i\}) \big[\sin\alpha_i^{(1)} + \sin\alpha_i^{(2)}\big]}{\sin(\pi - \theta_i + \alpha_i^{(1)} + \alpha_i^{(2)})}$.

Thus to prove (4.2), it suffices to prove that

    $\lim_{i \to \infty} \frac{\sin(\min\{\pi/2, \pi - \theta_i\}) \big[\sin\alpha_i^{(1)} + \sin\alpha_i^{(2)}\big]}{\sin([\pi - \theta_i] + \alpha_i^{(1)} + \alpha_i^{(2)})} = 0$.    (4.3)

We have shown that $\liminf_{i \to \infty} \theta_i \ge \bar{\theta} > 0$. The limit (4.3) holds because the limits
of $\alpha_i^{(1)}$ and $\alpha_i^{(2)}$ are zero and $\theta_i \in [\bar{\theta}, \pi]$ for all $i$. Hence we are done. □

The superlinear convergence in Theorem 4.6 does not extend to $\mathbb{R}^3$ however,
even when $K_1$ and $K_2$ are linear subspaces.

Example 4.7. (No superlinear convergence in $\mathbb{R}^3$ for Algorithm 4.1) We give an
example of subspaces $K_1$ and $K_2$ in $\mathbb{R}^3$ such that $\partial N_{K_1}(\bar{x}) \cap \partial[-N_{K_2}(\bar{x})] = \{0\}$
but there is no superlinear convergence to $\bar{x}$ in Algorithm 4.1 using (4.1b) for some
starting point. Consider $K_1$ and $K_2$ defined by

    $K_1 = \mathbb{R}(1, 0, 1)$,
    and $K_2 = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$.

For the starting point $x_0 = (4, -1, 0)$, we compute the iterates of Algorithm 4.1.
We calculate

    $x_1^{(1)} = P_{K_1}(x_0) = (2, 0, 2)$,
    $x_1^{(2)} = P_{K_2}(x_0) = (4, -1, 0)$,
    $x_1 = \big(\tfrac{2}{5}, \tfrac{4}{5}, 0\big)$,
    $x_2^{(1)} = P_{K_1}(x_1) = \big(\tfrac{1}{5}, 0, \tfrac{1}{5}\big)$,
    $x_2^{(2)} = P_{K_2}(x_1) = \big(\tfrac{2}{5}, \tfrac{4}{5}, 0\big)$,
    and $x_2 = \big(\tfrac{16}{85}, \tfrac{-4}{85}, 0\big) = \tfrac{4}{85} x_0$.    (4.4)

To verify that $x_1$ and $x_2$ are the correct iterates, we can check that $x_0$, $x_1^{(1)}$, $x_1^{(2)}$
and $x_1$ lie in the plane $\{x \mid (1, 2, 0)^T x = 2\}$, and that $x_1$, $x_2^{(1)}$, $x_2^{(2)}$ and $x_2$ lie in
the plane $\{x \mid (4, -1, 0)^T x = 4/5\}$. Another condition helpful for the verification is
that

    $x_i^T (x_i - x_{i-1}) = 0$ for $i = 1, 2$.

From (4.4), we see that the convergence to zero of Algorithm 4.1 using (4.1b) is
linear and not superlinear. But the rate of convergence for our choice of starting
iterate is $\tfrac{4}{85}$ for every four projections, which is more than twice as fast as the rate
of $\tfrac{1}{4}$ for every four projections for the usual MAP.
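The first two iterates of this example can be recomputed with exact rational arithmetic. The Python sketch below rests on an assumption that is not spelled out in the text: since $x_0$ already lies in $K_2$, projecting onto $K_2$ gives no new hyperplane, and the sketch takes the halfspace $\{x \mid x_3 \le 0\}$ as the contribution of $K_2$ to $F_i$. The function names are likewise hypothetical.

```python
from fractions import Fraction as Fr

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def scale(t, u):
    return tuple(t * a for a in u)

def proj_K1(p):
    """Project onto the line K_1 = R(1, 0, 1)."""
    t = (p[0] + p[2]) / 2
    return (t, Fr(0), t)

def proj_two_halfspaces(x, a, c):
    """Project x onto {y : <a, y> <= 0, <c, y> <= 0} by trying x itself,
    each single boundary plane, and the edge where both planes meet."""
    candidates = [x,
                  sub(x, scale(dot(a, x) / dot(a, a), a)),
                  sub(x, scale(dot(c, x) / dot(c, c), c))]
    g11, g12, g22 = dot(a, a), dot(a, c), dot(c, c)
    det = g11 * g22 - g12 * g12
    if det != 0:
        r1, r2 = dot(a, x), dot(c, x)
        s = (r1 * g22 - r2 * g12) / det
        t = (g11 * r2 - g12 * r1) / det
        candidates.append(sub(sub(x, scale(s, a)), scale(t, c)))
    feasible = [p for p in candidates if dot(a, p) <= 0 and dot(c, p) <= 0]
    return min(feasible, key=lambda p: dot(sub(p, x), sub(p, x)))

e3 = (Fr(0), Fr(0), Fr(1))        # assumed halfspace {x_3 <= 0} standing in for K_2
x0 = (Fr(4), Fr(-1), Fr(0))

iterates = [x0]
for _ in range(2):
    x = iterates[-1]
    a = sub(x, proj_K1(x))        # normal of the supporting hyperplane of K_1
    iterates.append(proj_two_halfspaces(x, a, e3))

x1, x2 = iterates[1], iterates[2]
print(x1, x2)
```

Under this assumption the computed iterates satisfy the two plane conditions quoted above, and the second iterate is a fixed rational multiple of $x_0$, which is consistent with linear rather than superlinear convergence.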

We show that if there were more supporting hyperplanes used in approximating
$K$, then we get finite convergence to zero for this example. The projection of $x_i$
onto $K_1$ generates the supporting hyperplanes

    $\{x \mid (2, -1, -2) x = 0\}$ if $i$ is even,
    and $\{x \mid (1, 4, -1) x = 0\}$ if $i$ is odd.

The projection of any point of the form (t, 0, t), where t > 0, onto the set 



x < 



is equal to the zero vector, which is the only point in K . 

5. SUPERLINEAR CONVERGENCE FOR THE SET INTERSECTION PROBLEM 

Our main result in this section is Theorem 5.11, where we prove the superlinear convergence of an algorithm for the Set Intersection Problem (1.1) when the normal cones at the point of intersection are pointed cones satisfying appropriate alignment conditions.

We first describe our algorithm for this section.

Algorithm 5.1. (Mass projection algorithm) For a starting iterate $x_0$ and closed convex sets $K_l \subset \mathbb{R}^n$, where $1 \le l \le r$, find a point in $K := \cap_{l=1}^r K_l$.

Step 0: Set $i = 1$, and let $p$ be some positive integer.

Step 1: Choose $J_i = \{1, \dots, r\}$.

Step 2: For $j \in J_i$, define $x_i^{(j)} \in \mathbb{R}^n$, $a_i^{(j)} \in \mathbb{R}^n$ and $b_i^{(j)} \in \mathbb{R}$ by

$x_i^{(j)} = P_{K_j}(x_{i-1})$,
$a_i^{(j)} = x_{i-1} - x_i^{(j)}$,
and $b_i^{(j)} = \langle a_i^{(j)}, x_i^{(j)} \rangle$.

Define the set $F_i \subset \mathbb{R}^n$ by

$F_i := \{x \mid \langle a_l^{(j)}, x \rangle \le b_l^{(j)} \text{ for } 1 \le j \le r,\ \max(1, i-p) \le l \le i\}$.

Let $x_i = P_{F_i}(x_{i-1})$.

Step 3: Set $i \leftarrow i + 1$, and go back to step 1.
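The steps of Algorithm 5.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the quadratic program in Step 2 is solved only approximately by Hildreth's dual coordinate ascent (a stand-in for any QP solver), the sets are supplied as projection maps, and the names `project_polyhedron`, `mass_projection` and `proj_ball` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_polyhedron(x0, cuts, sweeps=500):
    """Approximately project x0 onto {x : <a, x> <= b for (a, b) in cuts}.

    Hildreth's dual coordinate ascent is used here as a simple
    stand-in for a quadratic programming solver."""
    x = list(x0)
    lam = [0.0] * len(cuts)
    for _ in range(sweeps):
        for j, (a, b) in enumerate(cuts):
            aa = dot(a, a)
            if aa == 0.0:
                continue  # trivial cut: the iterate already lay in that set
            new = max(0.0, lam[j] + (dot(a, x) - b) / aa)
            shift = new - lam[j]
            x = [xi - shift * ai for xi, ai in zip(x, a)]
            lam[j] = new
    return x

def mass_projection(x0, projections, p, iters):
    """Sketch of Algorithm 5.1; projections[l] maps x to P_{K_l}(x)."""
    x = list(x0)
    blocks = []  # blocks[k] holds the halfspaces created at one iteration
    for _ in range(iters):
        cuts = []
        for proj in projections:
            y = proj(x)                             # x_i^{(j)}
            a = [xi - yi for xi, yi in zip(x, y)]   # a_i^{(j)}
            cuts.append((a, dot(a, y)))             # b_i^{(j)} = <a, y>
        # Keep the halfspaces from iterations max(1, i-p), ..., i.
        blocks = (blocks + [cuts])[-(p + 1):]
        x = project_polyhedron(x, [c for blk in blocks for c in blk])
    return x

def proj_ball(center, radius):
    """Projection onto a Euclidean ball, used only for the demonstration."""
    def proj(x):
        diff = [xi - ci for xi, ci in zip(x, center)]
        n = math.sqrt(dot(diff, diff))
        if n <= radius:
            return list(x)
        return [ci + radius * di / n for ci, di in zip(center, diff)]
    return proj

# Two overlapping unit discs in the plane; the start lies outside both.
x_final = mass_projection(
    [0.75, 3.0],
    [proj_ball((0.0, 0.0), 1.0), proj_ball((1.5, 0.0), 1.0)],
    p=3,
    iters=12,
)
```

In this toy instance the iterates approach the upper corner of the lens-shaped intersection from outside, since each $F_i$ is an outer approximation of $K$.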

The modifications in Algorithm 5.1 from Algorithm 4.1 are that we set $X = \mathbb{R}^n$, the number of sets $r$ is arbitrary, and the set $F_i$ approximating $K$ is created using more of the separating halfspaces produced in earlier iterations.

Algorithm 5.1 produces a sequence $\{x_i\}$ that is Fejér monotone with respect to $K$ and converges to a point $\bar{x} \in K$. The proof is an easy adaptation of that of Theorem 4.5.

We recall a well known fact about convex cones. 

Proposition 5.2. (Convex cone decomposition) A closed convex cone $C \subset \mathbb{R}^n$ can be written as the direct sum $C = L \oplus [L^\perp \cap C]$, where $L$ is the lineality subspace of $C$ and $L^\perp \cap C$ is a pointed convex cone.
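As a small illustration of the decomposition (the cone below is chosen only for this example and does not appear elsewhere in the paper), consider a wedge in $\mathbb{R}^3$ that contains a full line:

```latex
\[
C = \{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid x_3 \ge |x_1|\}, \qquad
L = C \cap (-C) = \{(0, t, 0) \mid t \in \mathbb{R}\},
\]
\[
L^{\perp} \cap C = \{(x_1, 0, x_3) \mid x_3 \ge |x_1|\}, \qquad
(x_1, x_2, x_3) = \underbrace{(0, x_2, 0)}_{\in L} + \underbrace{(x_1, 0, x_3)}_{\in L^{\perp} \cap C}.
\]
```

Here $L^\perp \cap C$ contains no line, so it is pointed, and every element of $C$ splits uniquely into the two parts shown.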

As a consequence of Proposition 5.2, we have the following result on the normal
cones of convex sets. We denote the lineality space of a convex set C by lin(C). 
The affine space spanned by C is denoted by aff-span(C). 

Proposition 5.3. (Lineality spaces of normals of convex sets) Suppose $C \subset \mathbb{R}^n$ is a convex set. Then for any $x \in C$, $[\operatorname{aff-span}(C)]^\perp = \operatorname{lin}(N_C(x))$. In particular, $\operatorname{aff-span}(C) \cap N_C(x)$ is a pointed convex cone.

Proof. $v \in \operatorname{lin}(N_C(x)) \iff \pm v \in N_C(x) \iff \langle v, x - c \rangle = 0$ for all $c \in C$ $\iff v \in [\operatorname{aff-span}(C)]^\perp$. □

The following result shows that under certain conditions, the directions from 
which the iterates converge to the limit must lie inside the normal cone of K at the 
limit. 

Lemma 5.4. (Approach of iterates to $\bar{x}$) For the problem of finding a point $x \in K$, where $K = \cap_{l=1}^r K_l$ and $K_l \subset \mathbb{R}^n$ are closed convex sets, suppose Algorithm 5.1 produces a sequence $\{x_i\}$ that converges to a point $\bar{x} \in K$ and is Fejér monotone with respect to $K$. Assume that:

(1) If $\sum_{l=1}^r v_l = 0$ for some $v_l \in N_{K_l}(\bar{x})$, then $v_l = 0$ for all $l = 1, \dots, r$.

Then provided none of the $x_i$ equals $\bar{x}$, we have

$$\lim_{i \to \infty} \frac{\|P_{N_K(\bar{x})}(x_i - \bar{x})\|}{\|x_i - \bar{x}\|} = 1. \tag{5.1}$$

Proof. Condition (1) and [RW98, Theorem 6.42] imply that

$$N_K(\bar{x}) = \sum_{l=1}^r N_{K_l}(\bar{x}). \tag{5.2}$$



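By the way Algorithm 5.1 is designed, the KKT conditions for the problem of projecting $x_{i-1}$ onto the polyhedron $F_i$ to obtain $x_i$ give

$$x_{i-1} - x_i = \sum_{l=1}^r \sum_{k=\max(1,i-p)}^{i} \left[\lambda_l^{(i,k)} v_l^k + w_l^{(i,k)}\right],$$

where $\lambda_l^{(i,k)} v_l^k + w_l^{(i,k)}$ is a multiple of the vector $x_{k-1} - P_{K_l}(x_{k-1})$, $w_l^{(i,k)} \in [\operatorname{aff-span}(K_l)]^\perp$, $v_l^k$ is a unit vector in $\operatorname{aff-span}(K_l) \cap N_{K_l}(P_{K_l}(x_{k-1}))$, and $\lambda_l^{(i,k)} \ge 0$. (The relationship $[\operatorname{aff-span}(K_l)]^\perp = \operatorname{lin}(N_{K_l}(P_{K_l}(x_{k-1})))$ follows from Proposition 5.3.) For $j > i$, we can write

$$x_{i-1} - x_j = \sum_{s=i}^{j} \sum_{l=1}^{r} \sum_{k=\max(1,s-p)}^{s} \left[\lambda_l^{(s,k)} v_l^k + w_l^{(s,k)}\right] = \sum_{l=1}^{r} \left[\sum_{s=i}^{j} \sum_{k=\max(1,s-p)}^{s} \lambda_l^{(s,k)} v_l^k + w_l^{(i,j)}\right],$$

where $w_l^{(i,j)} \in [\operatorname{aff-span}(K_l)]^\perp$. Let $v_l^{(i,j)} \in \mathbb{R}^n$ be the vector

$$v_l^{(i,j)} := \frac{\sum_{s=i}^{j} \sum_{k=\max(1,s-p)}^{s} \lambda_l^{(s,k)} v_l^k}{\left\|\sum_{s=i}^{j} \sum_{k=\max(1,s-p)}^{s} \lambda_l^{(s,k)} v_l^k\right\|}. \tag{5.3}$$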



Claim 1: All cluster points of $\{v_l^k\}_{k=1}^{\infty}$ lie in $\operatorname{aff-span}(K_l) \cap N_{K_l}(\bar{x})$ for $l = 1, \dots, r$. This claim is clear from the outer semicontinuity of the normal cone mapping.

Claim 2: The infinite sum

$$\sum_{s=i}^{\infty} \sum_{k=\max(1,s-p)}^{s} \lambda_l^{(s,k)} v_l^k \tag{5.4}$$

exists as a limit for $l = 1, \dots, r$. Hence $\lim_{j \to \infty} v_l^{(i,j)}$ exists.



Suppose on the contrary that Zi t i does not exist as a limit for some I, 1 < I < r. 
It follows that 



A; = oo, 



(5.5) 



E E 

s— i fc— max(l,s— p) 

because if the sum in (|5.5[) were finite, Z;^ would exist as a limit. Note that the 
cone aff-span(i'Q) n Nki(x) is pointed. Using Claim 1 and Proposition 15. 7f l). the 
subsequence {v\ t '^} c ? 1 has cluster points in aff-span(i'Q) n Nki(x). Let 



max 

KKr 



E E M 

s=i fc=max(l,s— p) 



\W\ n 
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We have limsup^^ aij — oo since (|5.5p holds for some Z, and by Proposition [5J3 
there is a constant m dependent only on aff-span(i'Q) n Njc^x) such that 



E E 

s—i k— max(l,s— p) 

Consider the equation 



> m 



E E 



A 



( s ,fe) 



s—i fc-max(l,s-p) 



1 r r 1 / j s 

—[^i - ^] = — E E 



r,(i-l,i) 



s— i fc-max(l,s-p) 



a, 



(5.6) 



It is clear that the LHS converges to zero as j — !> oo. We can choose a subsequence 
such that the limits tij := hra^oo and t'n : = lim^oo i ., where ti^j and 

i j are defined in (|5.6|) . exist and are not all zero for 1 < I < r. This would 
contradict Condition (1), ending the proof of Claim 2. 

In view of Claim 2, define

$$v_l^{(i)} := \lim_{j \to \infty} v_l^{(i,j)}.$$

Define the matrix $A^{(i)} \in \mathbb{R}^{n \times r}$ whose $l$th column is $v_l^{(i)}$. We can write



(5.7) 



( = 1 s=i fc=max(l,s — p) 

where 7 (lj) e R r is such that 7 p } := || Ei=i Efe=m ax (i )S -j>) ^ '/ 
1, . . . , r. Let := linx,.^ and 7W g R r be such that 



(5.8) 



^A, (s,fc Vll for Z 



7, W := N,. 



E E 

s=i fe=max(l,s— p) 



Then 



a«7 w =E^ = EE E 



^— 1 s—i fc=max(l,s- 



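Let

$$\mathcal{A} := \{A \in \mathbb{R}^{n \times r} \mid \text{the } l\text{th column of } A \text{ is a unit vector in } \operatorname{aff-span}(K_l) \cap N_{K_l}(\bar{x}) \text{ for } 1 \le l \le r\},$$

$$\mathcal{L} := \bigcap_{l=1}^{r} \operatorname{aff-span}(K_l),$$

$$\text{and } \beta := \inf\left\{\frac{\|P_{\mathcal{L}}(A\gamma)\|}{\|\gamma\|} \;\middle|\; A \in \mathcal{A} \text{ and } \gamma \in \mathbb{R}^r \setminus \{0\} \text{ satisfies } \gamma \ge 0\right\}. \tag{5.9}$$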



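Claim 3: $\beta > 0$. Suppose otherwise. Then there are sequences of matrices $A^{(i)} \in \mathcal{A}$ and unit vectors $\gamma^{(i)} \in \mathbb{R}^r$ such that $\gamma^{(i)} \ge 0$ and $\|P_{\mathcal{L}}(A^{(i)} \gamma^{(i)})\| \to 0$ as $i \to \infty$. By taking cluster points of $A^{(i)}$ and $\gamma^{(i)}$, we obtain $P_{\mathcal{L}}(A\gamma) = 0$ for some $A \in \mathcal{A}$ and $\gamma \ne 0$, where $\gamma \ge 0$. This contradicts Condition (1), so Claim 3 is proved.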
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Claim 4: lim i _ i . 00 [inf J 4e^ ||^4 — A^\\] = 0. The lib column of A W is the unit vec- 
tor Vj as defined in (|5.3p and (|5.7p . and each vf lies in aff-span(i'Q) n Nx t (Mg(x)), 
where S = ||xfc_i — Since {x{\ converges to x and is Fejer monotone, for any 
S > 0, we can find i' large enough so that \\xi — x\\ < S for all i > i'. This would 
mean that for all e > 0, we can find i large enough so that each v\ in the sum ()5.3|) 
satisfies II v, — 



< e for some unit vector i> ; S aff-span(i'Q) n Nk,(x) 



Let 



s=j Lt=max(l,s-p) A 2 V l 



Z^s=i Z^fc=max(l,s-p) A i 



Recall that 5j is the Zth column of A^- 1 . By Proposition ^. 7f 2). there is a constant 
m dependent only on aff-span(iQ) n Nx t (x) such that 



■*(*') 1 



< em. 



Since e \ as i / 00, we can see that the conclusion to Claim 4 holds. 

Since 7W > 0. It is clear from the definition of /3 and Claim 4 that if the 7 W's 
are nonzero, then 

|P £ (AW 7 W)|| 



lim inf • 



>j8. 



To prove that the conclusion (|5.1[) holds, it suffices to prove that 



lim 



inf ■ 

Ae.4 



= 0. 



(5.10) 



(5.11) 



The reason why (|5 . 1 1|) is sufficient is as follows. The vector 7W has nonnegative 
components, and Xi-i — x — AW 7 W + Ya=i 



for some wf' G [aff-span(iT;)]- 



Then A 7 W + Ya=i ^1 would lie in Nk{x) for any A G .A by 

In the case where are zero, the numerator in (|5.11|) is zero, so things are 
straightforward. So we shall look only at the subsequence for which 7W are nonzero. 
(We do not relabel.) For the denominator, we have — x|| > ||Pl(AW 7 W)||. 

Then Claim 4 and (l5TTUj) imply 

\AyW _^W 7 W||\ . inf 



< lim 



inf ■ 

AeA 



< lim 



\\A-A^\\ 

P 



= 0. 



Pi(A« 7 W)|| 
from which (|5. 1 1|) follows easily. 

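Proposition 5.5. (Intermediate estimate) Suppose $v_1$ and $v_2$ are nonzero vectors in $\mathbb{R}^n$ such that $\frac{\|v_1 - v_2\|}{\|v_1\|} \le \beta$. Then $\left\|\frac{v_1}{\|v_1\|} - \frac{v_2}{\|v_2\|}\right\| \le 2\beta$.

Proof. We have

$$\left\|\frac{v_1}{\|v_1\|} - \frac{v_2}{\|v_2\|}\right\| \le \left\|\frac{v_1}{\|v_1\|} - \frac{v_2}{\|v_1\|}\right\| + \left\|\frac{v_2}{\|v_1\|} - \frac{v_2}{\|v_2\|}\right\| \le \beta + \frac{\big|\|v_2\| - \|v_1\|\big|}{\|v_1\|} \le \beta + \frac{\|v_1 - v_2\|}{\|v_1\|} \le 2\beta.$$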



□

Proposition 5.6. (Pointed cone) For a closed pointed convex cone $K \subset \mathbb{R}^n$, there is a unit vector $d$ in $K^+$, the positive polar cone of $K$, and some $c > 0$ such that $\mathbb{B}_c(d) \subset K^+$. For any unit vector $v \in K$, we have $d^T v \ge c$.

Moreover, suppose $\lambda_i \ge 0$ and $v_i$ are unit vectors in $K$ for all $i$ and $\sum_{i=1}^{\infty} \lambda_i v_i$ converges to $v$. Clearly, $v \in K$. Then $\left\|\sum_{i=1}^{\infty} \lambda_i v_i\right\| \ge c \sum_{i=1}^{\infty} \lambda_i$, which also implies that $\sum_{i=1}^{\infty} \lambda_i$ is finite.

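Proof. For the unit vector $v \in K$, we have $(d - cv) \in K^+$, which gives $(d - cv)^T v \ge 0$, from which the first part follows.
Next,

$$\left\|\sum_{i=1}^{\infty} \lambda_i v_i\right\| \ge d^T \sum_{i=1}^{\infty} \lambda_i v_i = \sum_{i=1}^{\infty} \lambda_i d^T v_i \ge c \sum_{i=1}^{\infty} \lambda_i,$$

and the second part follows. □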
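For a concrete feel of Proposition 5.6, take $K = \mathbb{R}^n_+$, so that $K^+ = \mathbb{R}^n_+$ as well. One admissible choice (an assumption made only for this illustration) is $d = (1, \dots, 1)/\sqrt{n}$ with $c = 1/\sqrt{n}$. The hypothetical helper below samples unit vectors in $K$ and nonnegative weights and returns the quantities compared in the proposition.

```python
import math
import random

def check_pointed_cone(n=4, terms=50, seed=0):
    """Return (min d^T v, ||sum lam_i v_i||, c * sum lam_i) for K = R^n_+."""
    rng = random.Random(seed)
    c = 1.0 / math.sqrt(n)
    d = [c] * n                        # unit vector (1,...,1)/sqrt(n) in K^+
    total = [0.0] * n                  # running sum of lam_i * v_i
    lam_sum = 0.0
    min_dv = float("inf")
    for _ in range(terms):
        raw = [rng.random() + 1e-3 for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in raw))
        v = [t / norm for t in raw]    # unit vector in K
        lam = rng.random()             # nonnegative weight
        min_dv = min(min_dv, sum(di * vi for di, vi in zip(d, v)))
        total = [s + lam * vi for s, vi in zip(total, v)]
        lam_sum += lam
    norm_total = math.sqrt(sum(s * s for s in total))
    return min_dv, norm_total, c * lam_sum

min_dv, norm_total, lower_bound = check_pointed_cone()
```

The first returned value should not fall below $c$, and the second should not fall below the third, in line with the two parts of the proposition.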



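Proposition 5.7. (Limit estimates involving pointed cones) Suppose $\{v_i\}_i$ are unit vectors in $\mathbb{R}^n$ and $\{\lambda_i\}_i$ is a sequence of nonnegative numbers such that the cluster points of $\{v_i\}_i$ belong to a closed pointed convex cone $K \subset \mathbb{R}^n$. Then

(1) If $\sum_{i=1}^{\infty} \lambda_i = \infty$, then the cluster points of $\left\{\frac{\sum_{i=1}^{j} \lambda_i v_i}{\|\sum_{i=1}^{j} \lambda_i v_i\|}\right\}_{j=1}^{\infty}$ belong to $K$.

(2) Take $c > 0$ to be the constant in Proposition 5.6. If $\sum_{i=1}^{\infty} \lambda_i v_i$ is convergent and there are unit vectors $\tilde{v}_i \in K$ such that $\|v_i - \tilde{v}_i\| < \epsilon$, then

$$\left\|\frac{\sum_{i=1}^{\infty} \lambda_i v_i}{\|\sum_{i=1}^{\infty} \lambda_i v_i\|} - \frac{\sum_{i=1}^{\infty} \lambda_i \tilde{v}_i}{\|\sum_{i=1}^{\infty} \lambda_i \tilde{v}_i\|}\right\| \le \frac{2}{c}\epsilon. \tag{5.12}$$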



Proof. Statement (1): Since the cluster points of {vi} belong to K, for any e > 0, 
we can find I e such that \\vi — vAl < e for some Vi € if. Then 



Next, Proposition 15 . 61 implies that ||Ei=i 



>cELi A - So 



111 


AjUi - EL 


i X l v l 









< 



EU a, 



E-=i 1 2A i 



Proposition 15.51 gives 

ELl X i v i 



ELi A*** 



ELi x ^ 



ELi a*s* 



cELi^ 
EU V + Efc^A, 



< 2 



cELiA, 



The RHS of the above can be made arbitrarily small since e can be made arbitrarily 

Y 3 _ XiVi 

small and j can be made arbitrarily big. The term nf^j- 1 n- belongs to K, so 

l|E I= i A ^|l 

Statement (1) holds. 

Statement (2): First, since ESi AzVi is convergent, Proposition 1 5 . 61 implies 



£a< 



> c^2 X h 
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which also implies that ^» ^ s finite, and 



Then 



\\ZZiW<\\ 



< 



< 



By Proposition 5.5, we get the conclusion (5.12) as needed.



□ 



Next, we give conditions for estimating the distance to the point of convergence 
using the distance to the respective sets. We recall the definition of local linear 
regularity. 

Definition 5.8. (Local metric inequality) We say that a collection of closed sets $K_l$, $l = 1, \dots, r$, satisfies the local metric inequality at $\bar{x}$ if there are $\beta > 0$ and $\delta > 0$ such that

$$d\left(x, \cap_{l=1}^r K_l\right) \le \beta \max_{1 \le l \le r} d(x, K_l) \text{ for all } x \in \mathbb{B}_\delta(\bar{x}). \tag{5.13}$$
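A simple instance of (5.13), chosen only for illustration, is a pair of lines through the origin in $\mathbb{R}^2$ meeting at an angle $\theta \in (0, \pi/2]$: $K_1$ is the $x$-axis and $K_2$ is the line in direction $(\cos\theta, \sin\theta)$. Then $K = \{0\}$, and an elementary computation gives $\beta = 1/\sin(\theta/2)$. The hypothetical helpers below sample the unit circle to compare against this constant.

```python
import math
import random

def distances(x, theta):
    """Distances from x in R^2 to K1 (the x-axis) and K2 (line at angle theta)."""
    d1 = abs(x[1])
    d2 = abs(-math.sin(theta) * x[0] + math.cos(theta) * x[1])
    return d1, d2

def worst_ratio(theta, samples=10000, seed=1):
    """Largest sampled value of d(x, K) / max_l d(x, K_l) over unit vectors x."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(phi), math.sin(phi))   # d(x, K) = ||x|| = 1
        worst = max(worst, 1.0 / max(distances(x, theta)))
    return worst

theta = math.pi / 3
beta = 1.0 / math.sin(theta / 2)
sampled = worst_ratio(theta)
# The bisecting direction attains the constant beta.
bisector = (math.cos(theta / 2), math.sin(theta / 2))
attained = 1.0 / max(distances(bisector, theta))
```

As $\theta$ shrinks the two lines become nearly parallel and $\beta$ blows up, which is the usual picture of poorly conditioned intersections.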



In this paper, we shall only consider the case where $K_l$ are all convex. The term linear regularity is used in two different ways in [Kru06, after (15)] and [LLM09, Proposition 2.3], so we refrain from using the term here. A concise summary of further studies on the local metric inequality appears in [Kru06], who in turn referred to [BBL99, Iof00, NT01, NY04] on the topic of local metric inequality and its connection to metric regularity. Definition 5.8 is sufficient for our purposes. The local metric inequality is useful for proving the linear convergence of alternating projection algorithms [BB93, LLM09]. See [BB96] for a survey.

With the additional assumption of local metric inequality, we have the following 
result. 

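Lemma 5.9. (Estimates under local metric inequality) Let $K_l \subset \mathbb{R}^n$, where $1 \le l \le r$, be closed convex sets. Suppose a sequence $\{x_i\}$ converges to the point $\bar{x} \in K := \cap_{l=1}^r K_l$, $\{K_l\}_{l=1}^r$ satisfies the local metric inequality at $\bar{x}$, and

$$\lim_{i \to \infty} \frac{\|P_{N_K(\bar{x})}(x_i - \bar{x})\|}{\|x_i - \bar{x}\|} = 1. \tag{5.14}$$

Then there is a $\beta > 0$ such that

$$\|x_i - \bar{x}\| \le \beta \max_{1 \le l \le r} d(x_i, K_l) \text{ for all } i \text{ large enough}. \tag{5.15}$$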



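Proof. By Moreau's Theorem, we have $\|P_{T_K(\bar{x})}(x_i - \bar{x})\|^2 + \|P_{N_K(\bar{x})}(x_i - \bar{x})\|^2 = \|x_i - \bar{x}\|^2$, so

$$\lim_{i \to \infty} \frac{\|P_{T_K(\bar{x})}(x_i - \bar{x})\|^2}{\|x_i - \bar{x}\|^2} = \lim_{i \to \infty} \left[1 - \frac{\|P_{N_K(\bar{x})}(x_i - \bar{x})\|^2}{\|x_i - \bar{x}\|^2}\right] = 0. \tag{5.16}$$

Let $\tilde{x}_i$ be such that $\tilde{x}_i - \bar{x} = P_{N_K(\bar{x})}(x_i - \bar{x})$, and $x_i - \tilde{x}_i = P_{T_K(\bar{x})}(x_i - \bar{x})$. Formulas (5.14) and (5.16) give us

$$\lim_{i \to \infty} \frac{\|\tilde{x}_i - \bar{x}\|}{\|x_i - \bar{x}\|} = 1 \text{ and } \lim_{i \to \infty} \frac{\|x_i - \tilde{x}_i\|}{\|x_i - \bar{x}\|} = 0. \tag{5.17}$$

Since $\tilde{x}_i - \bar{x} \in N_K(\bar{x})$, we have $d(\tilde{x}_i, K) = \|\tilde{x}_i - \bar{x}\|$. So, by the Lipschitzness of the projection operation, we have

$$d(\tilde{x}_i, K) - \|x_i - \tilde{x}_i\| \le d(x_i, K) \le d(\tilde{x}_i, K) + \|x_i - \tilde{x}_i\|,$$

$$\|\tilde{x}_i - \bar{x}\| - \|x_i - \tilde{x}_i\| \le d(x_i, K) \le \|\tilde{x}_i - \bar{x}\| + \|x_i - \tilde{x}_i\|. \tag{5.18}$$

The formulas (5.17) and (5.18) give $\lim_{i \to \infty} \frac{d(x_i, K)}{\|x_i - \bar{x}\|} = 1$. Together with the definition of local metric inequality (5.13), we can obtain what we need. □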
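The use of Moreau's Theorem in the proof above can be illustrated on the simplest pair of mutually polar cones, $T = \mathbb{R}^n_+$ and $N = -\mathbb{R}^n_+$ (a choice made only for this sketch; in the proof the cones are $T_K(\bar{x})$ and $N_K(\bar{x})$). The hypothetical helper `moreau_split` returns the two projections, which are orthogonal and sum to the original vector.

```python
def moreau_split(x):
    """Split x into its projections onto T = R^n_+ and the polar cone N = -R^n_+."""
    t_part = [max(xi, 0.0) for xi in x]   # P_T(x)
    n_part = [min(xi, 0.0) for xi in x]   # P_N(x)
    return t_part, n_part

def sq_norm(u):
    return sum(a * a for a in u)

x = [3.0, -4.0, 0.5, -1.0]
t_part, n_part = moreau_split(x)
inner = sum(a * b for a, b in zip(t_part, n_part))
```

Because the two parts are orthogonal, the squared norms add up to $\|x\|^2$; this is the identity that turns (5.14) into (5.16).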

Local metric inequality follows from Condition (1) in Lemma 5.4. We paraphrase the result from [LLM09], where the authors remarked that the theorem is well known. For example, a globalized version appears in the survey [Bau01, Theorem 3.7] without attribution.

Lemma 5.10. (Condition for local metric inequality) Suppose $\bar{x} \in K$, where $K = \cap_{l=1}^r K_l$ and $K_l \subset \mathbb{R}^n$ for $1 \le l \le r$, and that Condition (1) of Lemma 5.4 holds. Then $\{K_l\}_{l=1}^r$ satisfies the local metric inequality at $\bar{x}$.

Proof. In [LLM09, Section 3], it was proved that if Condition (1) of Lemma 5.4 holds, then there is a constant $\kappa > 0$ such that

$$d\left(x, \cap_{l=1}^r (K_l - z_l)\right) \le \kappa \sqrt{\sum_{l=1}^r d^2(x, K_l - z_l)} \text{ for all } (x, z) \text{ near } (\bar{x}, 0).$$

This is easily seen to be stronger than the conclusion since we only need $z_l = 0$ for $1 \le l \le r$. □

We state the key result of this section. 

Theorem 5.11. (Superlinear convergence) Consider the problem of finding a point $x \in K$, where $K = \cap_{l=1}^r K_l$ and $K_l \subset \mathbb{R}^n$. Suppose Algorithm 5.1 produces a sequence $\{x_i\}$ that converges to a point $\bar{x} \in K$. Suppose also that the conditions in Lemma 5.4 hold, i.e.,

(1) If $\sum_{l=1}^r v_l = 0$ for some $v_l \in N_{K_l}(\bar{x})$, then $v_l = 0$ for all $l = 1, \dots, r$.

If $p$ in Algorithm 5.1 is sufficiently large, then we have

$$\limsup_{i \to \infty} \frac{\|x_{i+p} - \bar{x}\|}{\|x_i - \bar{x}\|} = 0. \tag{5.19}$$

Moreover, for that choice of $p$, if

$$\text{for some } \epsilon > 0, \ [K_l - \bar{x}] \cap \epsilon\mathbb{B} = T_{K_l}(\bar{x}) \cap \epsilon\mathbb{B} \text{ for all } l = 1, \dots, r, \tag{5.20}$$

then the convergence of $\{x_i\}$ to $\bar{x}$ is finite.
Proof. In Algorithm 5.1, let $l_i \in \{1, \dots, r\}$ be such that

$$l_i \in \arg\max_{1 \le l \le r} \|x_i - P_{K_l}(x_i)\| = \arg\max_{1 \le l \le r} d(x_i, K_l).$$

Let $v_i^*$ be the unit vector $v_i^* := \frac{x_i - P_{K_{l_i}}(x_i)}{\|x_i - P_{K_{l_i}}(x_i)\|}$. In other words, $v_i^*$ is the unit normal vector of the hyperplane that separates $x_i$ from $K_{l_i}$.

Without loss of generality, suppose that $\bar{x} = 0$. Suppose $\beta > 0$ is chosen such that (5.15) holds. From Lemma 5.10, we deduce that $\{K_l\}_{l=1}^r$ satisfies the local metric inequality at $\bar{x}$.

The sphere $S^{n-1} := \{w \in \mathbb{R}^n \mid \|w\| = 1\}$ is compact. Suppose $p$ is such that we can cover $S^{n-1}$ with $p$ balls of radius $\frac{1}{4\beta}$.

Next, among the vectors $\{v_i^*, v_{i+1}^*, \dots, v_{i+p}^*\}$, there must exist $j$ and $k$ such that $i \le j < k \le i + p$, and $v_j^*$ and $v_k^*$ belong to the same ball of radius $\frac{1}{4\beta}$ covering $S^{n-1}$. We thus have $\|v_j^* - v_k^*\| \le \frac{1}{2\beta}$. We can assume, using Theorem 2.1, that $i$ is large enough so that

$$v_j^{*T} x_k \le \epsilon \|x_j\|. \tag{5.21}$$

On the other hand, if $i$ is large enough, we can apply Lemma 5.9 to get

$$v_j^{*T} x_k = v_k^{*T} x_k + (v_j^* - v_k^*)^T x_k \ge d(x_k, K_{l_k}) - \frac{1}{2\beta}\|x_k\| \ge \frac{1}{2\beta}\|x_k\|. \tag{5.22}$$

The methods in Theorem 4.5 can be easily adapted to prove that the sequence $\{x_i\}$ is Fejér monotone with respect to $K$. The inequalities (5.21) and (5.22), and the Fejér monotonicity of $\{x_i\}$ combine to give

$$\|x_{i+p}\| \le \|x_k\| \le 2\beta\epsilon\|x_j\| \le 2\beta\epsilon\|x_i\|.$$

As the factor $\epsilon$ can be made arbitrarily close to 0, we have proved (5.19).

Next, under the added condition (5.20), the formula (5.21) becomes $v_j^{*T} x_k \le 0$ instead by an application of Moreau's Theorem (see Proposition 2.3), and the same steps show us that $\frac{1}{2\beta}\|x_k\| \le 0$, which forces $x_k = 0$, or $x_k = \bar{x}$. □

Even though the choice of $p$ in the proof of Theorem 5.11 is impractical, Theorem 5.11 gives justification that the idea of supporting hyperplanes and quadratic programming can lead to fast convergence.

5.1. Alternative estimates. We close this section with a result that might be 
helpful for estimating the distance of an iterate to the limit x. 

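Lemma 5.12. (Alternative estimate) Let $K := \cap_{l=1}^r K_l$, where $K_l$ are closed convex sets in $\mathbb{R}^n$ for $1 \le l \le r$. Let hyperplanes $H_j := \{x \mid \langle a_j, x \rangle = b_j\}$, points $a_j \in \mathbb{R}^n$ and $\tilde{x}_j \in \mathbb{R}^n$, where $j = 1, \dots, J$, be such that $\|a_j\| = 1$. Suppose $x^* \in \mathbb{R}^n$ lies on the hyperplanes $H_j$. Let $\bar{x} \in K$ be such that

(1) Each hyperplane $H_j$ is a supporting hyperplane to some $K_{l_j}$ and $K_{l_j} \subset \{x \mid \langle a_j, x \rangle \le b_j\}$.

(2) $\tilde{x}_j \in H_j \cap K_{l_j}$.

(3) $\max_j \|\tilde{x}_j - \bar{x}\| = L$,

(4) There is some $\epsilon > 0$ such that $-\epsilon \le \frac{\langle a_j, \bar{x} - \tilde{x}_j \rangle}{\|\bar{x} - \tilde{x}_j\|} \le 0$ for all $j = 1, \dots, J$.

Let $\sigma$ be the smallest singular value of the matrix $A \in \mathbb{R}^{n \times J}$, where the $j$th column of $A$ is $a_j$. Let $S$ be $\operatorname{span}\{a_1, \dots, a_J\}$. Let $\alpha$ be such that

$$\|M\|_{\infty,2} \le \alpha \|M\|_{2,2} \text{ for all } M \in \mathbb{R}^{n \times J}, \tag{5.23}$$

where $\|M\|_{p,q} := \sup_{w \ne 0} \frac{\|Mw\|_q}{\|w\|_p}$. Then

$$\|P_S(x^* - \bar{x})\| \le L\epsilon\alpha\sigma^{-1}. \tag{5.24}$$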
Proof. Since $x^*$ is in $H_j$, we have $\langle a_j, x^* \rangle = b_j$. By Conditions (3) and (4), we get

$$-\epsilon L \le -\epsilon\|\bar{x} - \tilde{x}_j\| \le \langle a_j, \bar{x} - \tilde{x}_j \rangle \le 0.$$

Since $\langle a_j, \tilde{x}_j \rangle = b_j = \langle a_j, x^* \rangle$, we have

$$0 \le \langle a_j, x^* - \bar{x} \rangle \le \epsilon L. \tag{5.25}$$

By standard linear least squares, we have

$$\|P_S(x^* - \bar{x})\|_2 = \|A(A^T A)^{-1} A^T (x^* - \bar{x})\|_2 \le \|A(A^T A)^{-1}\|_{\infty,2} \|A^T (x^* - \bar{x})\|_\infty.$$

By (5.25), we have $\|A^T (x^* - \bar{x})\|_\infty \le \epsilon L$. Furthermore, using standard properties of the singular value decomposition, we have

$$\|A(A^T A)^{-1}\|_{\infty,2} \le \alpha \|A(A^T A)^{-1}\|_{2,2} = \alpha\sigma^{-1}.$$

The required bound follows immediately. □

To apply Lemma 5.12 to Algorithm 5.1, note that Condition (3) follows from properties of the projection, while Condition (4) is an attempt to apply Theorem 2.1. Lemma 5.12 is closer to the spirit of Theorem 4.6. However, the term $\sigma^{-1}$ is hard to control, so we have not had success in applying Lemma 5.12 so far.

6. Infeasibility 

We now discuss the case where $K := \cap_{l=1}^r K_l = \emptyset$. For any algorithm producing a sequence $\{x_i\}$ in the hope of converging to a limit $\bar{x} \in K$, there are three possibilities:

(1) An infinite sequence cannot be produced because the intersection of the 
halfspaces is an empty set at some point. 

(2) The sequence $\{x_i\}$ has a cluster point $\bar{x}$.

(3) The sequence $\{x_i\}$ does not have a cluster point.

We first show that Case 2 is not possible for Algorithm 3.1 in the case of strong cluster points.

Theorem 6.1. (No cluster point) For Algorithm 3.1 using (3.1b), in the case where $K = \emptyset$, the sequence $\{x_i\}$ cannot contain a strong cluster point.

Proof. Suppose on the contrary that $\{x_i\}$ contains a strong cluster point, say $\bar{x}$. Since $\bar{x} \notin K$, we assume without loss of generality that $\bar{x} \notin K_1$. Then let $z := P_{K_1}(\bar{x})$, and $v = \bar{x} - z$. Let $a_x = x - P_{K_1}(x)$ and $b_x = \langle a_x, P_{K_1}(x) \rangle$. By elementary properties of the projection, we have $\langle a_{\bar{x}}, \bar{x} \rangle > b_{\bar{x}}$. The parameters $a_x$ and $b_x$ depend continuously on $x$. By the workings of Algorithm 3.1, we have

$$\langle a_{x_i}, x_{i+1} \rangle \le b_{x_i}.$$

As we take limits as $i \to \infty$, we get $\langle a_{\bar{x}}, \bar{x} \rangle \le b_{\bar{x}}$. This is a contradiction. □

One can easily check that Case 3 can happen. Consider the sets $K_1$ and $K_2$ defined by

$K_1 = \{(x, y) \in \mathbb{R}^2 \mid y \ge e^{-x}\}$,

and $K_2 = \{(x, y) \in \mathbb{R}^2 \mid y \le -e^{-x}\}$.

If $x_0$ is chosen to be the origin in Algorithm 3.1, then the iterates $x_i$ cannot converge to a limit by Theorem 6.1, and therefore must move in the direction of the positive $x$ axis. We understand more about such behavior with the result below.

Theorem 6.2. (Recession directions) If $\{x_i\}$ is a sequence of iterates for Algorithm 3.1 using (3.1b) in the case where $K = \emptyset$ and $X = \mathbb{R}^n$, then any cluster point of $\left\{\frac{x_i}{\|x_i\|}\right\}$ must lie in $R(K_l)$, the recession cone of $K_l$, for all $l = 1, \dots, r$.



Proof. Let $\left\{\frac{\tilde{x}_i}{\|\tilde{x}_i\|}\right\}$ be a subsequence of $\left\{\frac{x_i}{\|x_i\|}\right\}$ which has a limit $v$. We show that such a limit has to lie in $R(K_l)$. Seeking a contradiction, suppose that $v \notin R(K_l)$.

We show that there is a unit vector $w \in \mathbb{R}^n$ and $M \in \mathbb{R}$ such that $w^T c \le M$ for all $c \in K_l$ and $w^T v > 0$. Take any point $y \in K_l$. Since $v \notin R(K_l)$, there is some $\gamma \ge 0$ such that $y + \gamma v \in K_l$, but $y + \gamma' v \notin K_l$ for all $\gamma' > \gamma$. It follows that there exists a unit vector $w \in N_{K_l}(y + \gamma v)$ such that $w^T v > 0$, and we can take $M = w^T (y + \gamma v)$. Since $w^T v > 0$, we shall assume that $w^T \tilde{x}_i > M$ for all $i$.

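Let $c_i := P_{K_l}(\tilde{x}_i)$, and let $u_i$ be the unit vector in the direction of $\tilde{x}_i - c_i$. We write $\tilde{x}_i - c_i = \alpha_i u_i$. We have

$$u_i^T c_i = u_i^T (\tilde{x}_i - \alpha_i u_i) = u_i^T \tilde{x}_i - \alpha_i.$$

Also

$$\alpha_i w^T u_i = w^T \tilde{x}_i - w^T c_i \ge w^T \tilde{x}_i - M.$$

Since $\alpha_i w^T u_i = w^T (\tilde{x}_i - c_i) > M - M = 0$, we have $w^T u_i > 0$, and hence $\alpha_i \ge w^T \tilde{x}_i - M$. Therefore,

$$u_i^T c_i \le u_i^T \tilde{x}_i - w^T \tilde{x}_i + M.$$

By the workings of Algorithm 3.1, we have $u_i^T \tilde{x}_i > u_i^T c_i$ and $u_i^T \tilde{x}_j \le u_i^T c_i$ for all $j > i$. This gives $u_i^T (\tilde{x}_j - \tilde{x}_i) < 0$, which gives $u_i^T v \le 0$.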

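Let $u$ be a cluster point of $\{u_i\}$. We can consider subsequences so that $\lim_{i \to \infty} u_i$ exists. For any point $c \in K_l$, we have

$$\begin{aligned}
u^T c &= \lim_{i \to \infty} u_i^T c \\
&\le \liminf_{i \to \infty} u_i^T c_i \\
&\le \liminf_{i \to \infty} \left[u_i^T \tilde{x}_i - w^T \tilde{x}_i + M\right] \\
&= \liminf_{i \to \infty} \|\tilde{x}_i\| \left[u_i^T \frac{\tilde{x}_i}{\|\tilde{x}_i\|} - w^T \frac{\tilde{x}_i}{\|\tilde{x}_i\|}\right] + M \\
&= \liminf_{i \to \infty} \|\tilde{x}_i\| \left[u_i^T v - w^T v\right] + M \\
&= -\infty,
\end{aligned}$$

which is absurd. The contradiction gives $v \in R(K_l)$. □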
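The drift described for the sets $K_1 = \{y \ge e^{-x}\}$ and $K_2 = \{y \le -e^{-x}\}$ can be observed numerically. The sketch below uses plain alternating projections started at the origin, not Algorithm 3.1 (which is not reproduced here), so it only illustrates the qualitative behavior: the first coordinate keeps increasing and the normalized iterates approach $(1, 0)$, a common recession direction of both sets. The function names and the bisection-based projection are assumptions of this illustration.

```python
import math

def project_K1(px, py):
    """Project (px, py), with py <= 0, onto K1 = {(x, y) : y >= exp(-x)}.

    The nearest boundary point (t, exp(-t)) satisfies
    t - px = exp(-t) * (exp(-t) - py); the root is found by bisection."""
    def g(t):
        return t - px - math.exp(-t) * (math.exp(-t) - py)
    lo = px                                            # g(lo) < 0
    hi = px + math.exp(-px) * (math.exp(-px) - py)     # g(hi) >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, math.exp(-t)

def project_K2(px, py):
    """K2 is the mirror image of K1 in the x-axis (requires py >= 0)."""
    t, y = project_K1(px, -py)
    return t, -y

def map_iterates(rounds):
    """Alternate projections onto K1 and K2, starting from the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _ in range(rounds):
        x, y = project_K1(x, y)
        points.append((x, y))
        x, y = project_K2(x, y)
        points.append((x, y))
    return points

points = map_iterates(20)
xs = [pt[0] for pt in points]
last = points[-1]
direction = (last[0] / math.hypot(*last), last[1] / math.hypot(*last))
```

The increments of the first coordinate shrink quickly, which is consistent with the slow, unbounded drift that Theorem 6.2 describes through recession directions.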

7. Conclusion 

In this paper, we focus on the theoretical properties of using supporting hyper- 
planes and quadratic programming to accelerate the method of alternating pro- 
jections and its variants. It appears that as long as a separating hyperplane is 
obtained for K and the quadratic programs are not too big, it is a good idea to 
solve the associated quadratic program to obtain better iterates. Other issues to consider in a practical implementation would be to either remove or combine loose constraints so that the size of the intermediate quadratic programs does not get too big. The ideas in [Kiw95], for example, can be useful. It remains to be seen whether the theoretical properties in this paper translate to effective algorithms in practice.